Cloud-based systems are applications, resources or services provided to
users on demand through the Internet using a cloud computing provider’s
servers. These clouds trigger alarm events to indicate the health of the
system. Monitoring these alarms is essential for maintaining the health and
continuous functioning of the cloud. Because a very large number of alarms
is triggered daily, identifying critical alarms in time and taking the
required action is a challenging task. In this paper, a machine learning
model is implemented using a decision tree classifier to analyse each alarm,
predict whether any action is required for it, and notify the concerned team
by creating JIRA tickets.


1. Introduction 


Cloud-based systems are applications, resources or
services provided to users on demand through the
Internet using a cloud computing provider’s servers
(Anand). Companies use cloud-based computing to
enhance capacity, improve functionality or add
services on demand without investing in expensive
infrastructure or spending on training for existing
support staff. Customers use storage or software
offered by the service provider via a private or
public cloud. Industrial machines have various alarms
embedded in their machine controllers, which employ
sensors and machine states to notify end users or to
keep machines in a specific mode. In particular,
sensor data is compared with predefined threshold
values in the machine controller, and alarms are
triggered frequently (Agrawal et al.). The root
causes of system misbehaviour can be detected by
analysing alarm logs, and the problems can then be
fixed. Because of the enormous volume of system logs,
detecting critical alarms in time and tracing the key
reason for system faults has turned out to be a
complex challenge in enhancing the durability of
telecommunication network systems without
compromising the quality of customer service
(Yuan et al.).


The usual way to handle system failures is reactive:
when an internal fault is detected, a monitoring agent
triggers a recovery procedure to mitigate the problem
and a human operator is alerted. However, this method
is carried out after a fault has happened, and it may
take some extra time until the fault is notified. By
the time the recovery procedure starts, the fault may
already have caused some harm to the system. Alarm
events indicate defects caused by malfunctioning
hardware or software, or by incorrect operations by
users (Yu). Alarm data contains information about
fault diagnosis and recovery. Thus, the handling of
alarm data has a prominent impact on operating cost
and quality of service in the telecommunication
industry. Faults or unexpected events are unavoidable
in critical and advanced systems (Bhowmik, Chandana,
and Rudra).

Proactive failure detection is a way to detect events
in advance so that preventive or recovery measures
can be planned to improve system availability
(Adamu et al.). Machine learning algorithms have been
applied in different areas of research and have
performed well in learning and understanding patterns
(Wong and Yeh). Proactive failure detection assumes
that, prior to the occurrence of a failure, a few
parameters of the system reveal signs of the
approaching failure. When these data are collected
and analysed by the algorithms, specific
characteristics of the system in healthy and faulty
states can be learned during the training stage and
identified at runtime (Vrana and Korenek; Sanzo,
Avresky, and Pellegrini). Machine learning techniques
have proved suitable for finding patterns in
available datasets and for determining the class to
which a new sample belongs. Nokia’s UCIM (Unified
Cloud Infrastructure Management) tool is used for
monitoring telco clouds in different locations.
Alarms triggered by these clouds can be viewed in the
UCIM tool. Analysing each alarm manually is time
consuming and takes huge manual effort (Sun et al.).
In this paper, a machine learning model is built
using a decision tree classifier to watch the cloud
alarms efficiently and notify the concerned team or
person via incident creation.


2. Methodology 
FIGURE 1. Dataflow Diagram of Proposed Method

Figure 1 shows the dataflow diagram of the proposed
methodology. When a new alarm arrives in the UCIM
tool, the alarm is collected from the database, the
alarms are filtered by category, and a data set is
prepared for training. The data is then analysed and
pre-processed, for example by balancing the data and
removing duplicates. Once the data is ready, a model
needs to be chosen. Three algorithms, the decision
tree classifier, the random forest algorithm and the
naive Bayes classifier, are considered at this
initial phase; whichever algorithm gives the highest
accuracy is finalized for deployment. The model picks
up all active alarms in real time and analyses the
root cause of each alarm to suggest a workaround. An
internal JIRA ticket is created for the lab where the
cloud is located, so that quick action can be taken.


For implementation, three separate modules are
considered. The input module covers data collection
and pre-processing of the data. The model-building
module covers evaluation of algorithms, model
selection and implementation of the model. The
notification module is responsible for JIRA ticket
creation.
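
The division into three modules can be pictured with a
minimal Python sketch. The function names
(collect_alarms, predict_action, notify) and the alarm
fields are hypothetical stand-ins, not the interfaces of
the actual implementation, and the rule inside
predict_action is only a placeholder for the trained
classifier.

```python
from typing import Callable, Dict, List

Alarm = Dict[str, str]


def collect_alarms() -> List[Alarm]:
    """Input module (stub): would fetch and pre-process alarms."""
    return [
        {"alarm_id": "A1", "name": "Service failure", "severity": "Critical"},
        {"alarm_id": "A2", "name": "Heartbeat", "severity": "Minor"},
    ]


def predict_action(alarm: Alarm) -> int:
    """Model module (stub): 0 = action to be taken, 1 = no action required."""
    return 0 if alarm["severity"] == "Critical" else 1


def notify(alarm: Alarm) -> str:
    """Notification module (stub): would create a JIRA ticket."""
    return f"ticket for alarm {alarm['alarm_id']}"


def run_pipeline(
    collect: Callable[[], List[Alarm]],
    predict: Callable[[Alarm], int],
    notifier: Callable[[Alarm], str],
) -> List[str]:
    """Chain the three modules: only alarms predicted as 0 are notified."""
    return [notifier(alarm) for alarm in collect() if predict(alarm) == 0]


tickets = run_pipeline(collect_alarms, predict_action, notify)
```

Passing the modules as callables keeps each one
replaceable, for example when the placeholder rule is
swapped for a trained model.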


3. Implementation 


The main technologies used in this implementation are
Python, SQL, Scikit-learn and REST APIs.


• Python is a general-purpose, versatile programming
language. It has multiple libraries that can be used
to build machine learning algorithms.


• REST API stands for Representational State Transfer
application programming interface. When a client
request is made via a REST API, it transfers a
representation of the state of the resource to the
requester or endpoint.


• Scikit-learn is a free software machine learning
library for the Python programming language. It
features various classification and regression
algorithms.


Cloud alarms are collected from the UCIM tool via
REST APIs and stored in an external database.
Pre-processing refers to the transformations applied
to the data before it is used in the algorithm; it
converts the raw data into a clean data set. When
data is collected from different sources, the format
may differ between sources or may not be the desired
one. The data may also contain duplicate and
redundant values, some algorithms do not accept null
values, and classification algorithms need training
data that is balanced across categories. Hence the
data should be processed to be unique, balanced and
correct before it is used in machine learning methods
for training and testing. Since the response of the
REST API is in JSON format, it is converted to a CSV
file using a Python component. The pre-processing
includes removal of missing values, removal of
outliers, data visualization, data transformation,
balancing of data, etc. Figure 2 shows the use case
diagram of the input module.
FIGURE 2. Use case diagram of input module
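
The JSON-to-CSV conversion and part of the cleaning
described above can be sketched with the Python
standard library alone. The field names, the label
column and the sample alarms are hypothetical, and
random undersampling is only one possible way to
balance the classes; the paper does not state which
balancing technique was used.

```python
import csv
import io
import json
import random
from collections import defaultdict

# Hypothetical column names; "label" is 0 (action) or 1 (no action).
FIELDS = ["alarm_id", "name", "description", "rack", "label"]


def json_to_rows(payload: str):
    """Parse a JSON array of alarm objects into dicts restricted to FIELDS."""
    return [{f: item.get(f) for f in FIELDS} for item in json.loads(payload)]


def clean(rows):
    """Drop rows with missing values, then drop exact duplicates."""
    seen, out = set(), []
    for row in rows:
        if any(row[f] in (None, "") for f in FIELDS):
            continue
        key = tuple(row[f] for f in FIELDS)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def balance(rows, seed=0):
    """Undersample every class down to the size of the smallest class."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    n = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    return balanced


def to_csv(rows) -> str:
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


sample = json.dumps([
    {"alarm_id": "A1", "name": "Service failure",
     "description": "service failed", "rack": "R1", "label": 0},
    {"alarm_id": "A1", "name": "Service failure",
     "description": "service failed", "rack": "R1", "label": 0},  # duplicate
    {"alarm_id": "A2", "name": "Cluster degraded",
     "description": "cluster degraded", "rack": "R2", "label": 0},
    {"alarm_id": "A3", "name": "Heartbeat",
     "description": None, "rack": "R1", "label": 1},  # missing value
    {"alarm_id": "A4", "name": "Heartbeat",
     "description": "heartbeat ok", "rack": "R3", "label": 1},
])

cleaned = clean(json_to_rows(sample))
balanced = balance(cleaned)
csv_text = to_csv(balanced)
```

Here the duplicate and the row with a missing
description are dropped first, and balancing then keeps
one alarm per class because the smaller class has a
single remaining row.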


Feature selection means identifying and choosing the
input features that are most relevant to the target
variable. Features are selected by evaluating the
importance of each input feature and assigning it a
score. The score indicates the relative importance of
each feature in deciding a prediction. After
evaluating all the features available with cloud
alarms, namely alarm ID, name, description,
timestamp, rack and location, four features (name,
description, alarm ID and rack) are considered for
training. The data considered for training the model
is analysed for its correctness, uniqueness and
balance.
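
One way to obtain such importance scores is the
impurity-based feature_importances_ attribute of a
fitted scikit-learn decision tree. The sketch below is
illustrative only: the toy alarms and labels are
invented, and ordinal encoding of the categorical
fields is an assumption, since the paper does not say
how the features were encoded or scored.

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

FEATURES = ["alarm_id", "name", "description", "timestamp", "rack", "location"]

# Hypothetical toy alarms, one list per alarm in FEATURES order.
rows = [
    ["A1", "Service failure", "service is failed", "t1", "R1", "Lab-1"],
    ["A2", "Cluster degraded", "cluster is degraded", "t2", "R2", "Lab-1"],
    ["A3", "Heartbeat", "heartbeat restored", "t3", "R1", "Lab-2"],
    ["A4", "Heartbeat", "heartbeat restored", "t4", "R3", "Lab-2"],
    ["A1", "Service failure", "service is failed", "t5", "R2", "Lab-2"],
    ["A2", "Cluster degraded", "cluster is degraded", "t6", "R3", "Lab-1"],
    ["A3", "Heartbeat", "heartbeat restored", "t7", "R2", "Lab-1"],
    ["A4", "Heartbeat", "heartbeat restored", "t8", "R1", "Lab-2"],
]
# 0 = action to be taken, 1 = no action required.
labels = [0, 0, 1, 1, 0, 0, 1, 1]

# Map each categorical value to an integer code per column.
X = OrdinalEncoder().fit_transform(rows)

tree = DecisionTreeClassifier(random_state=0).fit(X, labels)

# One score per input feature; the scores sum to 1 for a tree with splits.
scores = dict(zip(FEATURES, tree.feature_importances_))
ranked = sorted(scores, key=scores.get, reverse=True)
```

Impurity-based scores tend to favour features with
many distinct values, such as timestamps, so a ranking
like this is usually checked against domain knowledge
before features are dropped.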

In order to choose a suitable algorithm for building
the model, three algorithms, decision tree, random
forest and naive Bayes classifier, are evaluated for
the accuracy of their predictions. Cloud alarms are
collected and the accuracy of the three classifiers
is tested over a one-week period. The average
accuracy obtained by each algorithm is tabulated in
Table 1.
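
A comparison of this kind can be sketched with
scikit-learn as follows. The alarm texts and labels are
invented, the bag-of-words encoding and the
MultinomialNB variant of naive Bayes are assumptions,
and 3-fold cross-validation stands in for the one-week
live evaluation described in the paper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Hypothetical alarm name + description strings.
texts = [
    "Service failure the state of a service is failed",
    "Cluster degraded the state of the cluster is degraded",
    "Service failure service stopped unexpectedly",
    "Cluster degraded node unreachable in cluster",
    "Disk failure disk is not responding",
    "Service failure the state of a service is failed on rack",
    "Heartbeat restored link is up again",
    "Login notice user logged in",
    "Backup completed scheduled backup finished",
    "Heartbeat restored connection is up again",
    "Login notice user logged out",
    "Backup completed nightly backup finished",
]
# 0 = action to be taken, 1 = no action required.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

candidates = {
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": MultinomialNB(),
}

results = {}
for name, clf in candidates.items():
    # The vectorizer is refitted inside each fold, so no test text leaks in.
    pipeline = make_pipeline(CountVectorizer(), clf)
    fold_scores = cross_val_score(pipeline, texts, labels, cv=3,
                                  scoring="accuracy")
    results[name] = fold_scores.mean()

# Name of the candidate with the highest mean accuracy.
best = max(results, key=results.get)
```

On a data set this small the scores say nothing about
real alarms; the sketch only shows the selection
procedure of picking the classifier with the highest
mean accuracy.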


TABLE 1. Average Accuracy obtained by different Algorithms


Algorithms       Average Accuracy (%)
Decision tree    98
Naive Bayes      85
Random forest    96


Once the results predicted by the model are obtained,
the cloud alarms are analysed and a report is
generated. According to the report, if an alarm is
critical, a notification is sent to the respective
authority. Before a ticket is created, it is
necessary to check whether the rack or cloud is in
maintenance mode or has any planned activity
scheduled, so that unnecessary tickets can be
reduced. The Jira incident/ticket is created using
the Python Jira client module. These tickets contain
the alarm ID, location and description of the alarm.
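
The maintenance check and ticket creation could look
roughly like the following sketch, which uses the
third-party jira package for Python. The project key
CLOUDOPS, the issue type Incident, the alarm field
names and the set of racks in maintenance are all
hypothetical; the real values depend on the Jira
instance and on how maintenance windows are recorded.

```python
from typing import Dict, Set


def should_create_ticket(alarm: Dict[str, str], prediction: int,
                         racks_in_maintenance: Set[str]) -> bool:
    """Ticket only if action is needed (0) and the rack is not in maintenance."""
    return prediction == 0 and alarm["rack"] not in racks_in_maintenance


def build_issue_fields(alarm: Dict[str, str],
                       project_key: str = "CLOUDOPS") -> dict:
    """Build the Jira fields: alarm ID, location and description."""
    return {
        "project": {"key": project_key},      # hypothetical project key
        "issuetype": {"name": "Incident"},    # hypothetical issue type
        "summary": f"[{alarm['location']}] Alarm {alarm['alarm_id']}: "
                   f"{alarm['name']}",
        "description": alarm["description"],
    }


def create_ticket(alarm: Dict[str, str], server: str, user: str, token: str):
    """Create the issue on a Jira server (requires the 'jira' package)."""
    from jira import JIRA  # imported lazily so the helpers work without it

    client = JIRA(server=server, basic_auth=(user, token))
    return client.create_issue(fields=build_issue_fields(alarm))


alarm = {
    "alarm_id": "A1",
    "name": "Service failure",
    "description": "The state of a service is failed",
    "rack": "R1",
    "location": "Lab-1",
}
fields = build_issue_fields(alarm)
```

Keeping the maintenance check and the field
construction separate from the network call means both
can be verified without access to a Jira server.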


4. Result 


[Bar chart: Accuracy (%) of Decision tree, Random forest and Naive Bayes]
FIGURE 3. Plot of accuracy of three models


TABLE 2. Average Accuracy obtained by the Model


Week      Average Accuracy (%)
week 1    83
week 2    85
week 3    90
week 4    95


In this paper, three supervised learning algorithms
were evaluated for the accuracy of cloud alarm
prediction. Figure 3 plots the average prediction
accuracy of the three supervised learning algorithms,
Random Forest, Naive Bayes and Decision tree
classifier, on cloud alarms collected over a one-week
period. The decision tree algorithm gave the highest
accuracy, 98%. Table 2 shows the average accuracy
obtained by the model.
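
For reference, accuracy is the share of alarms whose
predicted label matches the true label. The sketch
below computes it as a percentage and averages it over
several evaluation batches; the batches are invented,
and averaging per-batch accuracies is an assumption
about how the reported averages were formed.

```python
from statistics import mean


def accuracy_percent(predicted, actual):
    """Percentage of positions where predicted and actual labels agree."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical batches: 0 = action to be taken, 1 = no action required.
batch_scores = [
    accuracy_percent([0, 1, 1, 0], [0, 1, 1, 0]),  # 4 of 4 correct
    accuracy_percent([0, 1, 0, 0], [0, 1, 1, 0]),  # 3 of 4 correct
]
average_accuracy = mean(batch_scores)
```

With these two batches the per-batch accuracies are
100.0 and 75.0, giving an average of 87.5.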

The model is implemented with the decision tree
classifier, and its prediction accuracy is monitored
for four weeks with live data and tabulated in
Table 2. Each cloud alarm is fed to the model and
classified as one of the labelled outputs, 0 (action
to be taken) or 1 (no action required), as shown in
Table 3.
TABLE 3. Output prediction


Alarm ID   Description                            Name               Severity   Time Stamp
31xxx      The state of a service is failed       Service failure    Critical   2020-03-20T13:56:22.115Z
31xxx      The state of the cluster is degraded   Cluster degraded   Major      2019-02-16T07:17:21.645Z


5. Conclusion 


This paper evaluates three supervised machine
learning algorithms, random forest, decision tree and
naive Bayes classifier, on the collected alarm
events. The decision tree classifier is chosen as the
most suitable for this specific requirement after the
accuracy of the different algorithms is evaluated and
compared. The model gathers active alarms from clouds
via REST APIs, predicts whether action is required,
and creates JIRA tickets for the concerned team with
an accuracy of 95.73%. By using this model, the
manual effort of analysing each alarm is reduced. In
addition, alarms generated by clouds that are in
maintenance mode or scheduled for planned maintenance
can be excluded, which in turn reduces the burden of
analysis and ticket creation.